Smart Paper Notes

On Large Batch Training and Sharp Minima: A Fokker-Planck Perspective

Xiaowu Dai , Yuhua Zhu

Category: Machine Learning

2021-12-02

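We study the statistical properties of the dynamic trajectory of stochastic gradient descent (SGD). We treat mini-batch SGD and momentum SGD as stochastic differential equations (SDEs). We exploit the continuous formulation of the SDE and the theory of Fokker-Planck equations to develop new results on the escaping phenomenon and on the relationship between large batches and sharp minima. In particular, we find that the solution of the stochastic process tends to converge to flatter minima regardless of the batch size in the asymptotic regime. However, the convergence rate is rigorously proven to depend on the batch size. These results are validated empirically on various datasets and models.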

Translated by Google Translate
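
As a rough illustration of the SDE view described in the abstract above, the sketch below simulates a one-dimensional diffusion approximation of mini-batch SGD with the Euler-Maruyama scheme. The quadratic toy loss, the noise scaling `sqrt(lr / batch_size)`, and every function and parameter name here are assumptions chosen for illustration; none of them are taken from the paper.

```python
import math
import random


def simulate_sgd_sde(batch_size, lr=0.1, curvature=1.0, grad_noise_std=1.0,
                     steps=20000, seed=0):
    """Euler-Maruyama simulation of d(theta) = -f'(theta) dt + sqrt(lr / B) * sigma dW.

    Assumed toy loss: f(theta) = 0.5 * curvature * theta**2.
    With time step dt = lr, each update mirrors one SGD step whose
    gradient noise has standard deviation sigma / sqrt(B).
    """
    rng = random.Random(seed)
    dt = lr
    diffusion = math.sqrt(lr / batch_size) * grad_noise_std
    theta = 1.0
    samples = []
    for step in range(steps):
        drift = -curvature * theta
        theta += drift * dt + diffusion * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        if step >= steps // 2:  # discard the first half as burn-in
            samples.append(theta)
    return samples


def variance(values):
    """Population variance of a list of floats."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)
```

Under these assumptions, comparing `variance(simulate_sgd_sde(batch_size=1))` with `variance(simulate_sgd_sde(batch_size=100))` shows the iterates spreading less around the minimum as the batch size grows. This toy model only illustrates how batch size enters the noise term; it does not reproduce the paper's results on convergence rates or flat minima.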

Orthogonalized Kernel Debiased Machine Learning for Multimodal Data Analysis

Xiaowu Dai , Lexin Li

Category: Machine Learning (Statistics)

2021-03-12

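Multimodal imaging has transformed neuroscience research. While it presents unprecedented opportunities, it also poses serious challenges. In particular, it is difficult to combine the interpretability of a simple association model with the flexibility of a highly adaptive nonlinear model. In this paper, we propose an orthogonalized kernel debiased machine learning approach for multimodal data analysis, built on Neyman orthogonality and a form of decomposition orthogonality. We target the setting that arises naturally in almost all multimodal studies, where there is a primary modality of interest plus additional auxiliary modalities. We establish the root-$n$ consistency and asymptotic normality of the estimated primary parameter, the semiparametric estimation efficiency, and the asymptotic validity of the confidence band of the predicted primary modality effect. Our proposal enjoys, to a good extent, both model interpretability and model flexibility. It also differs considerably from existing statistical methods for multimodal data integration and from orthogonality-based methods for high-dimensional inference. We demonstrate the efficacy of our method through simulations and an application to a multimodal neuroimaging study of Alzheimer's disease.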

Learning in Multi-Stage Decentralized Matching Markets

Xiaowu Dai , Michael I. Jordan

Category: Machine Learning (Statistics)

2021-02-13

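Matching markets are often organized in a multi-stage and decentralized manner. Moreover, participants in real-world matching markets often have uncertain preferences. This paper develops a framework for learning optimal strategies in such settings, based on a nonparametric statistical approach and variational analysis. We propose an efficient algorithm, built on the concepts of a "lower uncertainty bound" and "calibrated decentralized matching," for maximizing the participants' expected payoff. We show that there is a welfare-versus-fairness trade-off characterized by the uncertainty level of acceptance. Participants will strategically favor a low uncertainty level to reduce competition and increase their expected payoff. We prove that participants can be better off with multi-stage matching than with single-stage matching. We demonstrate aspects of the theoretical predictions through simulations and an experiment using real data from college admissions.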

Learning Strategies in Decentralized Matching Markets under Uncertain Preferences

Xiaowu Dai , Michael I. Jordan

Category: Machine Learning | Machine Learning (Statistics)

2020-10-29

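We study the problem of decision-making under a scarcity of shared resources when the preferences of agents are unknown a priori and must be learned from data. Taking the two-sided matching market as a running example, we focus on the decentralized setting, where agents do not share their learned preferences with a central authority. Our approach is based on the representation of preferences in a reproducing kernel Hilbert space and on a learning algorithm for preferences that accounts for the uncertainty arising from competition among agents in the market. Under regularity conditions, we show that our estimator of preferences converges at a minimax optimal rate. Given this result, we derive optimal strategies that maximize agents' expected payoffs, and we calibrate the uncertain state by taking opportunity costs into account. We also derive an incentive-compatibility property and show that the outcome of the learned strategies has a stability property. Finally, we prove a fairness property asserting that there is no justified envy under the learned strategies.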

Benchmarking the Robustness of LiDAR Semantic Segmentation Models

Xu Yan , Chaoda Zheng , Zhen Li , Shuguang Cui , Dengxin Dai

Category: Computer Vision

2023-01-03

When using LiDAR semantic segmentation models for safety-critical applications such as autonomous driving, it is essential to understand and improve their robustness with respect to a large range of LiDAR corruptions. In this paper, we aim to comprehensively analyze the robustness of LiDAR semantic segmentation models under various corruptions. To rigorously evaluate the robustness and generalizability of current approaches, we propose a new benchmark called SemanticKITTI-C, which features 16 out-of-domain LiDAR corruptions in three groups, namely adverse weather, measurement noise, and cross-device discrepancy. We then systematically investigate 11 LiDAR semantic segmentation models, spanning different input representations (e.g., point clouds, voxels, and projected images), network architectures, and training schemes. Through this study, we obtain two insights: 1) The input representation plays a crucial role in robustness; specifically, different representations perform differently under specific corruptions. 2) Although state-of-the-art methods on LiDAR semantic segmentation achieve promising results on clean data, they are less robust when dealing with noisy data. Finally, based on these observations, we design a robust LiDAR segmentation model (RLSeg) that greatly boosts robustness with simple but effective modifications. We hope that our benchmark, comprehensive analysis, and observations can boost future research in robust LiDAR semantic segmentation for safety-critical applications.

Edge Enhanced Image Style Transfer via Transformers

Chiyu Zhang , Jun Yang , Zaiyan Dai , Peng Cao

Category: Computer Vision

2023-01-02

In recent years, arbitrary image style transfer has attracted increasing attention. Given a pair of content and style images, the goal is a stylized image that retains the content of the former while capturing the style patterns of the latter. However, it is difficult to balance content details against style features. When an image is stylized with sufficient style patterns, the content details may be damaged, and sometimes the objects in the image cannot be distinguished clearly. For this reason, we present a new transformer-based method named STT for image style transfer, together with an edge loss that noticeably enhances content details and avoids the blurred results caused by excessive rendering of style features. Qualitative and quantitative experiments demonstrate that STT achieves performance comparable to state-of-the-art image style transfer methods while alleviating the content leak problem.

A Survey for In-context Learning

Qingxiu Dong , Lei Li , Damai Dai , Ce Zheng , Zhiyong Wu , Baobao Chang , Xu Sun , Jingjing Xu , Lei Li , Zhifang Sui

Category: Natural Language Processing | Artificial Intelligence

2022-12-31

With the increasing ability of large language models (LLMs), in-context learning (ICL) has become a new paradigm for natural language processing (NLP), where LLMs make predictions based only on contexts augmented with a few training examples. Exploring ICL to evaluate and extrapolate the ability of LLMs has become a new trend. In this paper, we survey and summarize the progress, challenges, and future work in ICL. We first present a formal definition of ICL and clarify its relation to related studies. Then, we organize and discuss advanced techniques of ICL, including training strategies, prompting strategies, and so on. Finally, we present the challenges of ICL and provide potential directions for further research. We hope our work can encourage more research on uncovering how ICL works and on improving ICL.

NeRF-Gaze: A Head-Eye Redirection Parametric Model for Gaze Estimation

Pengwei Yin , Jiawu Dai , Jingjing Wang , Di Xie , Shiliang Pu

Category: Computer Vision

2022-12-30

Gaze estimation is the fundamental basis for many visual tasks. Yet the high cost of acquiring gaze datasets with 3D annotations hinders the optimization and application of gaze estimation models. In this work, we propose a novel head-eye redirection parametric model based on Neural Radiance Fields, which allows dense gaze data generation with view consistency and accurate gaze direction. Moreover, our head-eye redirection parametric model can decouple the face and eyes for separate neural rendering, so that face attributes, identity, illumination, and eye gaze direction can be controlled separately. Thus, diverse 3D-aware gaze datasets can be obtained by manipulating the latent codes belonging to different face attributes in an unsupervised manner. Extensive experiments on several benchmarks demonstrate the effectiveness of our method in domain generalization and domain adaptation for gaze estimation tasks.

Learning to mask: Towards generalized face forgery detection

Jianwei Fei , Yunshu Dai , Huaming Wang , Zhihua Xia

Category: Computer Vision

2022-12-29

Generalizability to unseen forgery types is crucial for face forgery detectors. Recent works have made significant progress in generalization through synthetic forgery data augmentation. In this work, we explore another path for improving generalization. Our goal is to reduce the features that are easy to learn in the training phase, so as to reduce the risk of overfitting to specific forgery types. Specifically, in our method, a teacher network takes face images as input and generates an attention map of the deep features with a diverse multihead attention ViT. The attention map is used to guide a student network to focus on the low-attended features by reducing the highly attended deep features. A deep feature mixup strategy is also proposed to synthesize forgeries in the feature domain. Experiments demonstrate that, without data augmentation, our method achieves promising performance on unseen forgeries and highly compressed data.

Swin MAE: Masked Autoencoders for Small Datasets

Zi'an Xu , Yin Dai , Fayu Liu , Weibing Chen , Yue Liu , Lifu Shi , Sheng Liu , Yuhang Zhou

Category: Computer Vision | Artificial Intelligence

2022-12-28

The development of deep learning models in medical image analysis is largely limited by the lack of large, well-annotated datasets. Unsupervised learning does not require labels and is therefore better suited to medical image analysis problems. However, most current unsupervised learning methods need to be applied to large datasets. To make unsupervised learning applicable to small datasets, we propose Swin MAE, a masked autoencoder with Swin Transformer as its backbone. Even on a dataset of only a few thousand medical images and without using any pre-trained models, Swin MAE is still able to learn useful semantic features purely from images. In terms of transfer learning results on downstream tasks, it can equal or even slightly outperform the supervised model obtained by Swin Transformer trained on ImageNet. The code will be publicly available soon.
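
To make the masked-autoencoder idea in the abstract above concrete, here is a minimal sketch of random patch masking, the generic step in which part of an image's patches is hidden and the model is trained to reconstruct them. The function name, the 0.75 default ratio, and the uniform random sampling are assumptions for illustration; the abstract does not describe the masking scheme Swin MAE actually uses.

```python
import random


def random_patch_mask(num_patches, mask_ratio=0.75, seed=None):
    """Split patch indices into (visible, masked) lists for one image.

    mask_ratio is a hypothetical default; the abstract does not state
    the ratio used by Swin MAE.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)  # uniform random permutation of patch indices
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(indices[:num_masked])
    visible = sorted(indices[num_masked:])
    return visible, masked
```

In a typical masked-autoencoder setup, only the visible patches would be passed to the encoder, and the reconstruction loss would be computed on the masked ones.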
